Visual Reasoning


Systematic Abductive Reasoning via Diverse Relation Representations in Vector-symbolic Architecture

Jan 21, 2025

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model

Jan 21, 2025

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Jan 21, 2025

GLAM: Global-Local Variation Awareness in Mamba-based World Model

Jan 21, 2025

Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization

Jan 21, 2025

EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery

Jan 20, 2025

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Jan 21, 2025

MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science

Jan 18, 2025

Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!

Jan 18, 2025

Visual RAG: Expanding MLLM visual knowledge without fine-tuning

Jan 18, 2025